Skip to content

Add t-digest#12961

Merged
rongrong merged 4 commits intoprestodb:masterfrom
gastonar:t_digest
Jul 20, 2019
Merged

Add t-digest#12961
rongrong merged 4 commits intoprestodb:masterfrom
gastonar:t_digest

Conversation

@gastonar
Copy link
Copy Markdown
Contributor

@gastonar gastonar commented Jun 17, 2019

Add t-digest to Presto, which allows faster and more accurate results when calculating quantiles while saving space in memory. In reference to issue #12929.

Note

We make intentional decision to just copy the existing implementation for now. See #12961 (review) and #12961 (comment)

@gastonar gastonar changed the title Add t-digest [WIP]: Add t-digest Jun 17, 2019
@gastonar gastonar mentioned this pull request Jun 17, 2019
@gastonar gastonar force-pushed the t_digest branch 2 times, most recently from 56a6498 to 2fe3004 Compare June 18, 2019 19:15
@gastonar gastonar changed the title [WIP]: Add t-digest Add t-digest Jun 18, 2019
@gastonar gastonar requested a review from tdcmeehan June 18, 2019 19:18
Copy link
Copy Markdown
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments. I would also move these classes into their own package for now, com.facebook.presto.operator.scalar.tdigest

Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/Centroid.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/ScaleFunction.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/ScaleFunction.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/TDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/TDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/MergingDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/MergingDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/MergingDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/Sort.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/Sort.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/MergingDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/MergingDigest.java Outdated
Copy link
Copy Markdown
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I don't think changing variable names are necessary (yes, longer names are more readable, but for algorithms like this readable names is not that critical to understanding it anyways. I feel safer to see they are not changed from the original. Just personal opinion.), also it's not clear to me why a lot of comments are removed. Reducing these would make reviewing easier.

Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/Centroid.java Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit strange to see hashCode and equals not based on the same variables. What's the reason behind this?

Copy link
Copy Markdown
Contributor Author

@gastonar gastonar Jun 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each instance of a centroid has a unique id, even if the centroid contains the same values. Therefore, if we compare by id, it would always return false.

Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/MergingDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/MergingDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/MergingDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/MergingDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/MergingDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/MergingDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/Centroid.java Outdated
@gastonar gastonar force-pushed the t_digest branch 3 times, most recently from 2cfea70 to 078308b Compare June 19, 2019 23:19
@jessesleeping jessesleeping self-requested a review June 20, 2019 02:04
Comment thread presto-main/src/main/java/com/facebook/presto/operator/scalar/Centroid.java Outdated
@gastonar gastonar force-pushed the t_digest branch 5 times, most recently from 5c3169a to 532fb28 Compare June 21, 2019 19:36
Comment thread presto-main/docs/t-digest.md Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/TDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/TDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/TDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/Centroid.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/TDigestUtils.java Outdated
Comment thread presto-main/src/test/java/com/facebook/presto/tdigest/BenchmarkTDigest.java Outdated
Comment thread presto-main/src/test/java/com/facebook/presto/tdigest/BenchmarkTDigest.java Outdated
Comment thread presto-main/src/test/java/com/facebook/presto/tdigest/TestTDigest.java Outdated
Comment thread presto-main/docs/t-digest.md Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to update png files? Or we can just strip them out

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the image makes it a lot easier to understand and is consistent with QuantileDigest documentation, so I think we should keep them.

@highker
Copy link
Copy Markdown

highker commented Jul 11, 2019

there is one commit with a wrong email <50886172+gastonar@users.noreply.github.com>

@highker
Copy link
Copy Markdown

highker commented Jul 11, 2019

Chatted offline; we will leave most names unchanged.

@gastonar gastonar force-pushed the t_digest branch 3 times, most recently from b83e7a6 to 9a23de4 Compare July 11, 2019 17:53
@highker highker self-requested a review July 11, 2019 18:25
Copy link
Copy Markdown
Contributor

@jessesleeping jessesleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the implementation from Ted Dunning is correct as I didn't look into the actual logic of the t-digest implementation. The part that wiring it to Presto looks good to me.

Copy link
Copy Markdown

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Commit title "Add serialization documents" -> "Add tdigest serialization documents" maybe.
  • Can you fix your email address for commit "Add NOTICES file"?

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments after a quick glance.

Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/AbstractTDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/AbstractTDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/AbstractTDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/AbstractTDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/Centroid.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/Centroid.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/Sort.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/TDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/ScaleFunction.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/Sort.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/TDigest.java Outdated
@wenleix
Copy link
Copy Markdown
Contributor

wenleix commented Jul 11, 2019

Per offline discussion, it's an intention decision to first use existing solution as it is.

@wenleix
Copy link
Copy Markdown
Contributor

wenleix commented Jul 12, 2019

@rongrong :

Personally I don't think changing variable names are necessary (yes, longer names are more readable, but for algorithms like this readable names is not that critical to understanding it anyways. I feel safer to see they are not changed from the original. Just personal opinion.), also it's not clear to me why a lot of comments are removed. Reducing these would make reviewing easier.

I agree for the initial commit we can keep it as a copy. Although I personally prefer to copy it into a different project (e.g. presto-digest), instead of presto-main. 😃

But since @tdcmeehan volunteered to move them to a different project right after this, let's merge as it is 😃

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't looked into thoroughly. But since it's majorly a copy of another library, and I believe this already gets compressively reviewed.

Screen Shot 2019-07-11 at 5 06 49 PM

Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/TDigest.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/TDigest.java Outdated
Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Add implementation of T-Digest"

Maybe also add commit message like "This commit copied the original code as it is intentionally. Refactored will be done in future commits"

This commit copied the original code as it is intentionally. Refactoring will be done in future commits.
@gastonar gastonar force-pushed the t_digest branch 4 times, most recently from b44c56d to 3aabb13 Compare July 12, 2019 01:39
@tdcmeehan
Copy link
Copy Markdown
Contributor

There's some copy paste in the graffle metadata. Other than that I think this is good to merge @rongrong

Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/TDigestUtils.java Outdated
Comment thread presto-main/src/main/java/com/facebook/presto/tdigest/docs/tdigest.graffle Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants